In recent years, audio-visual joint learning for action recognition has attracted increasing attention. In both video (the visual modality) and audio (the auditory modality), the occurrence of an action is instantaneous: only the information within the time span of the action strongly expresses the action category. How to better exploit the salient information carried by the key frames of the audio-visual modalities is therefore one of the problems to be solved in audio-visual action recognition. To address this problem, a key-frame screening network, KFIA-S, is proposed. Through a linear temporal attention mechanism based on a fully connected layer, different weights are assigned to the audio-visual information at different times, so as to select the audio-visual features beneficial to video classification, reduce redundant information, suppress background interference, and improve the accuracy of action recognition. The effect of different strengths of temporal attention on action recognition is also studied. Experiments on the ActivityNet dataset show that the KFIA-S network achieves state-of-the-art (SOTA) recognition accuracy, which demonstrates the effectiveness of the proposed method.
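The abstract does not specify the exact form of KFIA-S, but a linear temporal attention based on a fully connected layer is a standard construction: each time step's fused audio-visual feature is scored by a linear layer, the scores are softmax-normalized over time, and the features are pooled by those weights. A minimal NumPy sketch under these assumptions (all names, shapes, and the softmax normalization are illustrative, not taken from the paper):

```python
import numpy as np

def linear_temporal_attention(features, W, b):
    """Score each time step with a linear (fully connected) layer,
    normalize the scores over time with a softmax, and pool the
    per-frame features by their attention weights.

    features: (T, D) array, one fused audio-visual feature per time step
    W: (D,) weight vector of the scoring layer; b: scalar bias
    Returns the (D,) attention-pooled feature and the (T,) weights.
    """
    scores = features @ W + b                 # (T,) raw score per frame
    scores = scores - scores.max()            # subtract max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over time
    pooled = weights @ features               # (D,) weighted temporal average
    return pooled, weights

# Toy example: 8 time steps, 16-dimensional features
rng = np.random.default_rng(0)
T, D = 8, 16
features = rng.standard_normal((T, D))
W = rng.standard_normal(D)
pooled, weights = linear_temporal_attention(features, W, b=0.0)
```

Frames the scoring layer finds uninformative receive weights near zero, which is the "screening" effect described above: redundant or background-dominated time steps contribute little to the pooled feature passed to the classifier.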